Configure MetaSpore to Access S3
MetaSpore supports S3 as a storage backend for reading sample data and writing model output.
There are two ways to configure S3 access, depending on where MetaSpore runs:
1. Access S3 on AWS EC2
AWS EC2 supports IAM Role credentials, so no extra configuration is needed.
Note: in AWS China regions, you need to set the AWS_REGION environment variable (e.g. AWS_REGION=cn-north-1 for the Beijing region); otherwise the AWS SDK looks up buckets outside China by default. Export it before running MetaSpore training:
```shell
export AWS_REGION=cn-north-1
```
To run distributed MetaSpore training on Spark, you need to set the env to all executors:
spark_session = SparkSession.builder
.config('spark.executorEnv.AWS_REGION', 'cn-north-1')
.getOrCreate()
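The spark.executorEnv.* route is needed because executors run as separate processes, usually on other hosts, and do not inherit variables exported in the driver's shell. To check that the region and IAM Role credentials resolve correctly, here is a minimal sanity-check sketch; it assumes boto3 is installed, and 'my-bucket' is a placeholder for a bucket your role can read:

```python
import boto3

# boto3 picks up AWS_REGION and the EC2 instance's IAM Role automatically.
s3 = boto3.client('s3')

# 'my-bucket' is a hypothetical name; substitute one of your own buckets.
resp = s3.list_objects_v2(Bucket='my-bucket', MaxKeys=1)
print('objects visible:', resp.get('KeyCount', 0))
```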
2. Access S3 Outside AWS EC2
In this case you need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. For S3-compatible services such as OSS, OBS, COS, MinIO, etc., the AWS_ENDPOINT environment variable is also required:
```shell
export AWS_ENDPOINT=<endpoint url>
export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your access key>
```
For distributed Spark jobs, set the corresponding spark.executorEnv.AWS_* configs, as with AWS_REGION above.
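A minimal sketch of that configuration follows; the placeholder values mirror the shell exports above and should be replaced with your real endpoint and keys (prefer injecting secrets via your cluster's secret mechanism rather than hard-coding them in job code):

```python
from pyspark.sql import SparkSession

# Placeholder values; substitute your provider's endpoint and your credentials.
spark_session = (
    SparkSession.builder
    .config('spark.executorEnv.AWS_ENDPOINT', '<endpoint url>')
    .config('spark.executorEnv.AWS_ACCESS_KEY_ID', '<your access key id>')
    .config('spark.executorEnv.AWS_SECRET_ACCESS_KEY', '<your access key>')
    .getOrCreate()
)
```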
For endpoint URLs, refer to your cloud service provider's documentation:
- Alibaba Cloud OSS: https://www.alibabacloud.com/help/en/object-storage-service/latest/regions-and-endpoints
- Tencent Cloud COS: https://intl.cloud.tencent.com/document/product/436/6224?lang=en&pg=
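For example, an Alibaba Cloud OSS public endpoint takes the form https://oss-<region>.aliyuncs.com, e.g. https://oss-cn-hangzhou.aliyuncs.com for the Hangzhou region; confirm the exact URL for your region and network type in the docs above.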